Conversation

@mohitmundhragithub
Contributor

No description provided.

@mohitmundhragithub mohitmundhragithub requested review from a team and anhappdev as code owners August 26, 2025 05:41
github-actions bot commented Aug 26, 2025

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

…lemented performance benchmark for LLM pipeline
…y input and issue_query only handles output tokens
@farook-edev farook-edev changed the title Feat llm LLM pipeline implementation Sep 2, 2025
@farook-edev farook-edev linked an issue Sep 2, 2025 that may be closed by this pull request
Contributor Author

@mohitmundhragithub mohitmundhragithub left a comment

.

@farook-edev farook-edev marked this pull request as ready for review October 31, 2025 08:56
@farook-edev
Contributor

Regarding the iOS CI issue, the problem had two parts:

  1. Eigen enabled exceptions regardless of `-fno-exceptions`.
  2. TensorFlow -> XNNPACK -> FP16 -> math.h was incompatible with the x86_64 simulator.

Part 1 was resolved by adding a patch that, for the time being, creates a macro forcing exceptions off. The macro is enabled only for iOS builds.
NOTE: This will become unnecessary once TensorFlow is updated, since newer versions appear to drop Eigen as a dependency.

Part 2 was resolved by fetching the same FP16 version that TensorFlow 2.18.0 pulls in via XNNPACK and applying a patch that removes the math.h dependency.
NOTE: THIS WILL REDUCE iOS PERFORMANCE according to some comments on GitHub, and will be unnecessary once XNNPACK is upgraded, since XNNPACK reportedly dropped the FP16 dependency early this year.

@freedomtan
Contributor

@anhappdev please help check if it's safe to merge this into the master branch.

@freedomtan
Contributor

> Regarding the iOS CI issue, the problem had two parts:
>
> 1. Eigen enabled exceptions regardless of `-fno-exceptions`.
> 2. TensorFlow -> XNNPACK -> FP16 -> math.h was incompatible with the x86_64 simulator.
>
> Part 1 was resolved by adding a patch that, for the time being, creates a macro forcing exceptions off; the macro is enabled only for iOS builds. NOTE: This will become unnecessary once TensorFlow is updated, since newer versions appear to drop Eigen as a dependency.
>
> Part 2 was resolved by fetching the same FP16 version that TensorFlow 2.18.0 pulls in via XNNPACK and applying a patch that removes the math.h dependency. NOTE: THIS WILL REDUCE iOS PERFORMANCE according to some comments on GitHub, and will be unnecessary once XNNPACK is upgraded, since XNNPACK reportedly dropped the FP16 dependency early this year.

@farook-edev I’m not quite sure why there’s a compatibility issue with math.h. As far as I understand, math.h is ANSI C and POSIX compatible, so it shouldn’t have anything to do with 16-bit floating-point operations, since it predates the FP16 format's popularity.

Anyway, could we upgrade TensorFlow? Which version should we use? @anhappdev

@freedomtan
Contributor

@anhappdev let's create a submission-v6.0 based on this and keep upcoming updates/PRs to that branch.

@farook-edev
Contributor

cat mlperf_log_summary.txt
================================================
MLPerf Results Summary
================================================
SUT name : TFLite
Scenario : SingleStream
Mode     : PerformanceOnly
90th percentile latency (ns) : 53819307167
90th first token percentile latency (ns) : 42895741078
Result is : INVALID
  Min duration satisfied : Yes
  Min queries satisfied : Skipped
  Early stopping satisfied: NO
Recommendations:
 * The test exited early, before enough queries were issued.
   See the detailed log for why this may have occurred.
TTFT Early Stopping Result:

TPOT Early Stopping Result:
 * Only processed 8 queries.
 * Need to process at least 64 queries for early stopping.

================================================
Additional Stats
================================================
QPS w/ loadgen overhead         : 0.02
QPS w/o loadgen overhead        : 0.02

Min latency (ns)                : 41576831338
Max latency (ns)                : 53819307167
Mean latency (ns)               : 45227442899
50.00 percentile latency (ns)   : 45318612899
90.00 percentile latency (ns)   : 53819307167
95.00 percentile latency (ns)   : 53819307167
97.00 percentile latency (ns)   : 53819307167
99.00 percentile latency (ns)   : 53819307167
99.90 percentile latency (ns)   : 53819307167

TPS w/ loadgen overhead         : 0.05
TPS w/o loadgen overhead        : 0.04
Min First Token latency (ns)                : 32835322331
Max First Token latency (ns)                : 42895741078
Mean First Token latency (ns)               : 36094364693
50.00 percentile first token latency (ns)   : 36311724777
90.00 percentile first token latency (ns)   : 42895741078
95.00 percentile first token latency (ns)   : 42895741078
97.00 percentile first token latency (ns)   : 42895741078
99.00 percentile first token latency (ns)   : 42895741078
99.90 percentile first token latency (ns)   : 42895741078

Min Time to Output Token (ns)                : 8396929372
Max Time to Output Token (ns)                : 10923566089
Mean Time to Output Token (ns)               : 9133078206
50.00 percentile time to output token (ns)   : 8970371559
90.00 percentile time to output token (ns)   : 10923566089
95.00 percentile time to output token (ns)   : 10923566089
97.00 percentile time to output token (ns)   : 10923566089
99.00 percentile time to output token (ns)   : 10923566089
99.90 percentile time to output token (ns)   : 10923566089

================================================
Test Parameters Used
================================================
samples_per_query : 1
target_qps : 1000
ttft_latency (ns): 100000000
tpot_latency (ns): 100000000
max_async_queries : 1
min_duration (ms): 60000
max_duration (ms): 300000
min_query_count : 100
max_query_count : 0
qsl_rng_seed : 3066443479025735752
sample_index_rng_seed : 10688027786191513374
schedule_rng_seed : 14962580496156340209
accuracy_log_rng_seed : 0
accuracy_log_probability : 0
accuracy_log_sampling_target : 0
print_timestamps : 0
performance_issue_unique : 0
performance_issue_same : 0
performance_issue_same_index : 0
performance_sample_count : 1

No warnings encountered during test.

No errors encountered during test.

@freedomtan

@freedomtan
Contributor

@anhappdev and @freedomtan to check if the number of samples is one.

@farook-edev please upgrade the version of loadgen first.

@freedomtan
Contributor

mlperf_client testing method.

@Mostelk different input/output sizes appear to be too much for a mobile environment, so let's go with a simple configuration.

@anhappdev anhappdev changed the base branch from master to submission-v6.0 November 4, 2025 07:27
@anhappdev anhappdev merged commit bd890dd into submission-v6.0 Nov 4, 2025
30 checks passed
@anhappdev anhappdev deleted the feat-llm branch November 4, 2025 07:28
@github-actions github-actions bot locked and limited conversation to collaborators Nov 4, 2025
@anhappdev
Collaborator

> Anyway, could we upgrade TensorFlow? Which version should we use? @anhappdev

Regarding the TensorFlow version: if possible we should use the latest available, v2.20.0, because it works with Bazel 7, which will resolve the issue with the iOS build on macOS 26. TensorFlow v2.19 and earlier still use Bazel 6.

@anhappdev
Collaborator

@farook-edev Could you build the Android app on macOS? I'm getting this error:

ERROR: /Users/anh/dev/mlcommons/mobile_app_open/mobile_back_tflite/cpp/backend_tflite/BUILD:110:18: Linking mobile_back_tflite/cpp/backend_tflite/libtflitebackend.so failed: (Exit 1): clang failed: error executing command (from target //mobile_back_tflite/cpp/backend_tflite:libtflitebackend.so) 
  (cd /private/var/tmp/_bazel_anh/30b0ae0ebfc82d789f6eeabcb52f979d/execroot/mlperf_app && \
  exec env - \
    PATH='/Users/anh/Library/Caches/bazelisk/downloads/bazelbuild/bazel-6.3.2-darwin-arm64/bin:/Users/anh/sdk/venv/venv_p39/bin:/opt/homebrew/opt/node@18/bin:/Users/anh/cache/pub-cache/bin:/Users/anh/sdk/Flutter/flutter_3.19.6/bin:/Users/anh/sdk/Flutter/flutter_3.19.6/bin/cache/dart-sdk/bin:/Users/anh/sdk/Android/sdk/platform-tools:/Users/anh/sdk/Java/openjdk-17.0.1/Contents/Home/bin:/opt/homebrew/opt/ruby/bin:/opt/homebrew/bin:/Library/Frameworks/Python.framework/Versions/2.7/bin:/opt/homebrew/bin:/opt/homebrew/sbin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin:/Library/Apple/usr/bin:/Library/TeX/texbin:/Users/anh/sdk/venv/venv_p39/bin:/opt/homebrew/opt/node@18/bin:/Users/anh/cache/pub-cache/bin:/Users/anh/sdk/Flutter/flutter_3.19.6/bin:/Users/anh/sdk/Flutter/flutter_3.19.6/bin/cache/dart-sdk/bin:/Users/anh/sdk/Android/sdk/platform-tools:/Users/anh/sdk/Java/openjdk-17.0.1/Contents/Home/bin:/opt/homebrew/opt/ruby/bin:/Library/Frameworks/Python.framework/Versions/2.7/bin:/Users/anh/Library/Application Support/JetBrains/Toolbox/scripts:/Users/anh/Library/Application Support/JetBrains/Toolbox/scripts' \
    PWD=/proc/self/cwd \
  external/androidndk/toolchains/llvm/prebuilt/darwin-x86_64/bin/clang @bazel-out/arm64-v8a-opt/bin/mobile_back_tflite/cpp/backend_tflite/libtflitebackend.so-2.params)
# Configuration: 4a3d5e0ee256349d5cd18cd0b151ba6eaa79e5b36b30cd73291980202fce22d3
# Execution platform: @local_execution_config_platform//:platform
ld.lld: error: unknown argument '-framework'
ld.lld: error: cannot open CoreFoundation: No such file or directory
clang: error: linker command failed with exit code 1 (use -v to see invocation)

The iOS build works fine on macOS, though.
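The `ld.lld: error: unknown argument '-framework'` message suggests Apple-only linker flags are reaching the Android NDK toolchain, whose `ld.lld` does not understand them. Such flags are usually gated per platform with a Bazel `select()`; the target name and flag list below are illustrative, not the repo's actual BUILD rule:

```starlark
# Hypothetical sketch: keep Apple framework flags out of Android links.
cc_library(
    name = "backend_platform_deps",  # illustrative name
    linkopts = select({
        "@platforms//os:ios": ["-framework", "CoreFoundation"],
        "@platforms//os:macos": ["-framework", "CoreFoundation"],
        # Android / Linux: no Apple frameworks, so ld.lld never sees them.
        "//conditions:default": [],
    }),
)
```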

Successfully merging this pull request may close these issues.

Master issue: LLM Benchmark